Audio-Visual Speaker Verification using Continuous Fused HMMs
Abstract
This paper examines audio-visual speaker verification using a novel adaptation of fused hidden Markov models, in comparison to output fusion of individual classifiers in the audio and video modalities. A comparison of both hidden Markov model (HMM) and Gaussian mixture model (GMM) classifiers in both modalities under output fusion shows that the choice of audio classifier is more important than the choice of video classifier. Although temporal information allows an HMM to outperform a GMM individually in video, this temporal information does not carry through to output fusion with an audio classifier, where the difference between the two video classifiers is minor. An adaptation of fused hidden Markov models, designed to be more robust to within-speaker variation, is used to show that the temporal relationship between video observations and audio states can be harnessed to reduce errors in audio-visual speaker verification when compared to output fusion.
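As an illustration of what output fusion means here, the sketch below combines one audio score and one video score into a single verification decision. The abstract does not state which fusion rule the paper uses, so the weighted sum, the function names `fuse_scores` and `verify`, and the default weight and threshold are all assumptions made for this example.

```python
def fuse_scores(audio_score, video_score, audio_weight=0.7):
    """Weighted-sum output fusion of two per-modality verification scores.

    The scores are assumed to be comparable (for example, normalised
    log-likelihood ratios). The weighted-sum rule and the default weight
    are illustrative assumptions, not values taken from the paper.
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_score + (1.0 - audio_weight) * video_score


def verify(audio_score, video_score, threshold=0.0, audio_weight=0.7):
    """Accept the claimed identity when the fused score reaches the threshold."""
    return fuse_scores(audio_score, video_score, audio_weight) >= threshold
```

Under this kind of rule each classifier is scored independently and only the final scores are combined, so any frame-level timing relationship between the two modalities is not used. That is the contrast the abstract draws with the fused HMM approach.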
Similar resources
A New Approach to Integrate Audio and Visual Features of Speech
This paper presents a novel fused-hidden Markov model (fused-HMM) to integrate the audio and visual features of speech. In this model, audio and visual HMMs built individually are fused together using a general probabilistic fusion method, which is optimal in the maximum entropy sense. Specifically, the fusion method uses the dependencies between the audio hidden states and the visual observati...
Transition-oriented hidden Markov models for speaker verification
In this article, we present a novel mechanism by which more precise voiceprints can be constructed in a typical text-dependent speaker verification system based on a continuous density hidden Markov model (HMM). Typical voiceprints (speaker-dependent HMMs) are first trained using a subscriber's enrollment data. The resulting models are then restructured to permit a modeling of sub-state behavior. ...
An Examination of Audio-visual Fused HMMs for Speaker Recognition
Fused hidden Markov models (FHMMs) have been shown to work well for the task of audio-visual speaker recognition, but only in an output decision-fusion configuration of both the audio- and video-biased versions of the FHMM structure. This paper looks at the performance of the audio- and video-biased versions independently, and shows that the audio-biased version is considerably more capable for spe...
Robust speaker verification insensitive to session-dependent utterance variation and handset-dependent distortion
This paper investigates a method of creating robust speaker models that are not sensitive to session-dependent (SD) utterance variation and handset-dependent (HD) distortion for hidden Markov model (HMM)-based speaker verification systems in a real telephone network. We recently reported a method of creating session-independent (SI) speaker-HMMs that are not sensitive to SD utterance variation. ...
Fused HMM adaptation of synchronous HMMs for audio-visual speaker verification
A technique known as fused hidden Markov models (FHMMs) was recently proposed as an alternative multi-stream modelling technique for audio-visual speaker recognition. In this paper, we will show that instead of being treated as a separate modelling technique, FHMMs can be adopted as a novel method of training synchronous hidden Markov models (SHMMs). SHMMs are traditionally jointly trained on bot...